Distributed language representation for authorship attribution
نویسندگان
چکیده
منابع مشابه
Cross-Language Authorship Attribution
This paper presents a novel task of cross-language authorship attribution (CLAA), an extension of authorship attribution task to multilingual settings: given data labelled with authors in language X , the objective is to determine the author of a document written in language Y , where X 6= Y . We propose a number of cross-language stylometric features for the task of CLAA, such as those based o...
متن کاملAuthorship Attribution in Bengali Language
We describe Authorship Attribution of Bengali literary text. Our contributions include a new corpus of 3,000 passages written by three Bengali authors, an end-toend system for authorship classification based on character n-grams, feature selection for authorship attribution, feature ranking and analysis, and learning curve to assess the relationship between amount of training data and test accu...
متن کاملAuthorship Attribution Using the Chaos Game Representation
The Chaos Game Representation, a method for creating images from nucleotide sequences, is modified to make images from chunks of text documents. Machine learning methods are then applied to train classifiers based on authorship. Experiments are conducted on several benchmark data sets in English, including the widely used Federalist Papers, and one in Portuguese. Validation results for the trai...
متن کاملApplication of the distributed document representation in the authorship attribution task for small corpora
Distributed word representation in a vector space (word embeddings) is a novel technique that allows to represent words in terms of the elements in the neighborhood. Distributed representations can be extended to larger language structures like phrases, sentences, paragraphs, and documents. The capability to encode semantic information of texts and the ability to handle high dimensional data se...
متن کاملA New Document Author Representation for Authorship Attribution
This paper proposes a novel representation for Authorship Attribution (AA), based on Concise Semantic Analysis (CSA), which has been successfully used in Text Categorization (TC). Our approach for AA, called Document Author Representation (DAR), builds document vectors in a space of authors, calculating the relationship between textual features and authors. In order to evaluate our approach, we...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Digital Scholarship in the Humanities
سال: 2017
ISSN: 2055-7671,2055-768X
DOI: 10.1093/llc/fqx046